Soft-Supervised Learning for Text Classification
نویسندگان
چکیده
We propose a new graph-based semisupervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a weighted undirected graph and our proposed framework minimizes the weighted Kullback-Leibler divergence between distributions that encode the class membership probabilities of each vertex. The proposed objective is convex with guaranteed convergence using an alternating minimization procedure. Further, it generalizes in a straightforward manner to multi-class problems. We present results on two standard tasks, namely Reuters-21578 and WebKB, showing that the proposed algorithm significantly outperforms the state-of-the-art.
منابع مشابه
Comparison of Supervised and Semi- Supervised Fuzzy Clusters in Text Categorization
Electronics gadgets are part of human life in these days, as a result abundant data is generated and it is growing in exponential rate. Data Generated was earlier stored in dumped repositories. The paper attempts in proposing a classified repository so that at later retrieval of stored data or navigation becomes easy. In the present paper comparison between supervised and semi-supervised classi...
متن کاملEmotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
متن کاملON SUPERVISED AND SEMI-SUPERVISED k-NEAREST NEIGHBOR ALGORITHMS
The k-nearest neighbor (kNN) is one of the simplest classification methods used in machine learning. Since the main component of kNN is a distance metric, kernelization of kNN is possible. In this paper kNN and semi-supervised kNN algorithms are empirically compared on two data sets (the USPS data set and a subset of the Reuters-21578 text categorization corpus). We use a soft version of the kN...
متن کاملCombining Unigrams and Bigrams in Semi-Supervised Text Classification
Unlabeled documents vastly outnumber labeled documents in text classification. For this reason, semi-supervised learning is well suited to the task. Representing text as a combination of unigrams and bigrams has not shown consistent improvements compared to using unigrams in supervised text classification. Therefore, a natural question is whether this finding extends to semi-supervised learning...
متن کاملText classification from unlabeled documents with bootstrapping and feature projection techniques
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, generally known as supervised learning. However, the supervised learning approaches have some problems. The most notable problem is that they require a large number of labeled training documents for acc...
متن کاملTowards Multi Label Text Classification through Label Propagation
Classifying text data has been an active area of research for a long time. Text document is multifaceted object and often inherently ambiguous by nature. Multi-label learning deals with such ambiguous object. Classification of such ambiguous text objects often makes task of classifier difficult while assigning relevant classes to input document. Traditional single label and multi class text cla...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008